
use "LAD/Data/LAD.dta", clear

*Identifiers
rename lin__i id
rename fin__i fid
rename wgt2_i weight

*Demographics
rename yob__i yob
rename yod__i yod
rename age__i age
rename age__p page
gen age_calc=year-yob
rename fcmp_i famcomp
rename indfli indtype
rename psco_i postcode
rename prco_i province
rename kid1_i kidage1
rename kid2_i kidage2
rename kid3_i kidage3
rename kid4_i kidage4
rename kid5_i kidage5
rename kid6_i kidage6

*Employment, Earnings Characteristics
rename t4e__i empinc
rename t4e__p pempinc
rename txi__i taxinc
rename txi__p ptaxinc
rename aftici atinc
rename afticp patinc
rename sei__i selfempinc
rename sei__p pselfempinc
rename tirc_i ttlinc
rename tirc_p pttlinc
rename nftxci tax_fed
rename nftxcp ptax_fed
rename nptxci tax_prov
rename nptxcp ptax_prov
rename eins_i eiinc
rename eins_p peiinc
rename dues_i dues
rename dues_p pdues
replace naic1i="." if (naic1i=="NNN"|naic1i=="UUU")
destring naic1i, replace
rename naic1i naics
rename saspyi sassist
rename saspyp psassist

*Savings
rename clkgli capgain
rename clkglp pcapgain
rename invi_i invinc
rename invi_p pinvinc
rename cqppdi cqppcont
rename cqppdp pcqppcont
rename clcppi cqppcontself
rename clcppp pcqppcontself
rename tpajai penadj
rename tpajap ppenadj
rename t4rp_i rppcont
rename t4rp_p prppcont
rename rrspci rspcont
rename rrspcp prspcont
rename rsppii rsptrsf
rename rrspdi rspdlc
rename rrspli rspdln
rename t4rspi rspwd
rename t4rspp prspwd
rename hbpshi hbpshort
rename hbpshp phbpshort
rename tfsacye_i tfsamkt
rename tfsacye_p ptfsamkt
rename tfsactb_i tfsacont
rename tfsactb_p ptfsacont
rename tfsawdl_i tfsawd
rename tfsawdl_p ptfsawd

*Disability Savings
rename rdsp_i rdsp
rename rdsp_p prdsp

*Pension Income
rename oasp_i oasinc
rename oasp_p poasinc
rename oaspri oasrepay
rename oasprp poasrepay
rename oasfli gisflag
rename cqpp_i cqppben
rename cqpp_p pcqppben
rename sop4ai peninc
rename sop4ap ppeninc
rename espa_i pensplt
rename espa_p ppensplt
rename espadi penspltd
rename espadp ppenspltd

*Disability Pension Income
rename dsbcqi cqppdisben
rename dsbcqp pcqppdisben

*Disability Allowances/Exemptions
rename mdexci medexp
rename mdexcp pmedexp
rename disdni disab
rename disdnp pdisab
rename disdoi disabdep
rename disdop pdisabdep
rename disdti disabtrsf
rename disdtp pdisabtrsf

*Other
rename totdni donation
rename totdnp pdonation
rename finbli finalbal
rename finblp pfinalbal

sort id year
xtset id year

keep if year<=2011

*Selects the relevant cohort by year of birth (yob)
keep if yob>=1942 & yob<=1966
*Counts, for each individual, the observations with flag_i equal to 1 or 2
*and stores that count on every row of the individual
*For example, an individual with 10 such observations is assigned 10 on every row
egen freq=total(flag_i==1|flag_i==2), by(id)
*Keeps only individuals observed in at least 18 years (the 90% threshold)
keep if freq>=18

*Generates the marital status control (1 if famcomp takes one of the listed codes, 0 otherwise)
gen married=inlist(famcomp,1,-1,2,-2,5,-5,8)
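
*Optional check, not part of the original code: cross-tabulate the family
*composition codes against the new indicator to confirm that only the
*intended codes map to married==1 (missing famcomp is shown and maps to 0)
tabulate famcomp married, missing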

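*Generate the lagged pension adjustment variable
*egen may change the sort order, so sort within id by year explicitly
*[_n-1] is the previous observation, which is the previous year only if there are no gaps
bysort id (year): gen penadjl=penadj[_n-1]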

*Generate the contribution room in year t=1991-94 by backward induction
*Uses employment income to proxy for the relevant earned income measure
*See "RRSP, earned income for" for the income measure from 1995 onward
*Employment income likely understates room, which is better than overstating
*rspdlc[_n+1] is the following observation within id, so sort by year explicitly
*Missing values compare as larger than any number in Stata, so the caps exclude them
bysort id (year): replace rspdlc = rspdlc[_n+1] + rspcont + penadj - 0.18*empinc if year==1994
replace rspdlc=51000 if rspdlc>51000 & rspdlc<. & year==1994
by id: replace rspdlc = rspdlc[_n+1] + rspcont + penadj - 0.18*empinc if year==1993
replace rspdlc=38500 if rspdlc>38500 & rspdlc<. & year==1993
by id: replace rspdlc = rspdlc[_n+1] + rspcont + penadj - 0.18*empinc if year==1992
replace rspdlc=26000 if rspdlc>26000 & rspdlc<. & year==1992
by id: replace rspdlc = rspdlc[_n+1] + rspcont + penadj - 0.18*empinc if year==1991
replace rspdlc=14500 if rspdlc>14500 & rspdlc<. & year==1991
replace rspdlc=0 if rspdlc<0
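
*Illustration of one backward step, using made-up numbers rather than LAD values:
*1995 value of rspdlc = 10000, 1994 contribution (rspcont) = 2000,
*1994 pension adjustment (penadj) = 1000, 1994 employment income (empinc) = 40000
*implied 1994 room = 10000 + 2000 + 1000 - 0.18*40000 = 5800, below the 1994 cap of 51000
display 10000 + 2000 + 1000 - 0.18*40000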
*Drop every observation of the few individuals identified as outliers by
*inexplicably high RRSP contribution room in 1998, 2000 or 2005
egen temp=total(year==1998 & rspdlc==111811), by(id)
egen temp2=total(year==2000 & rspdlc==139330), by(id)
egen temp3=total(year==2005 & (rspdlc==211255|rspdlc==428075)), by(id)
drop if temp==1|temp2==1|temp3==1
drop temp temp2 temp3

*Selects the relevant variables to keep
*Note: female is not created in this file, so it must already exist in LAD.dta
keep id fid weight yob year age female married province empinc taxinc selfempinc ttlinc eiinc dues naics invinc capgain cqppcont rppcont penadj penadjl rspcont rspdlc rspwd hbpshort cqppben peninc medexp disab

*Convert NAICS code from 3-digit to 2-digit
replace naics=trunc(naics/10)

*Drops person-year observations with pension income (public or private)
*Observations with a missing value in either variable are dropped as well
keep if cqppben==0 & peninc==0

save "LAD/Data/LAD_Crowdout.dta", replace

exit